35 research outputs found

    Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers

    Get PDF
    Online Multi-Object Tracking (MOT) from videos is a challenging computer vision task which has been extensively studied for decades. Most of the existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm combined with popular machine learning approaches which largely reduce the human effort to tune algorithm parameters. However, the commonly used supervised learning approaches require the labeled data (e.g., bounding boxes), which is expensive for videos. Also, the TBD framework is usually suboptimal since it is not end-to-end, i.e., it considers the task as detection and tracking, but not jointly. To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. Learning is then driven by the reconstruction error through backpropagation. We further propose a Reprioritized Attentive Tracking to improve the robustness of data association. Experiments conducted on both synthetic and real video datasets show the potential of the proposed model. Our project page is publicly available at: https://github.com/zhen-he/tracking-by-animationComment: CVPR 201

    Optimal O

    Get PDF
    A number of acceleration schemes for speeding up the time-consuming bilateral filter have been proposed in the literature. Among these techniques, the histogram-based bilateral filter trades the flexibility for achieving O(1) computational complexity using box spatial kernel. A recent study shows that this technique can be leveraged for O(1) bilateral filter with arbitrary spatial and range kernels by linearly combining the results of multiple-box bilateral filters. However, this method requires many box bilateral filters to obtain sufficient accuracy when approximating the bilateral filter with a large spatial kernel. In this paper, we propose approximating arbitrary spatial kernel using a fixed number of boxes. It turns out that the multiple-box spatial kernel can be applied in many O(1) acceleration schemes in addition to the histogram-based one. Experiments on the application to the histogram-based acceleration are presented in this paper. Results show that the proposed method has better accuracy in approximating the bilateral filter with Gaussian spatial kernel, compared with the previous histogram-based methods. Furthermore, the performance of the proposed histogram-based bilateral filter is robust with respect to the parameters of the filter kernel

    Multimedia Data Modelling Using Multidimensional Recurrent Neural Networks

    No full text
    Modelling the multimedia data such as text, images, or videos usually involves the analysis, prediction, or reconstruction of them. The recurrent neural network (RNN) is a powerful machine learning approach to modelling these data in a recursive way. As a variant, the long short-term memory (LSTM) extends the RNN with the ability to remember information for longer. Whilst one can increase the capacity of LSTM by widening or adding layers, additional parameters and runtime are usually required, which could make learning harder. We therefore propose a Tensor LSTM where the hidden states are tensorised as multidimensional arrays (tensors) and updated through a cross-layer convolution. As parameters are spatially shared within the tensor, we can efficiently widen the model without extra parameters by increasing the tensorised size; as deep computations of each time step are absorbed by temporal computations of the time series, we can implicitly deepen the model with little extra runtime by delaying the output. We show by experiments that our model is well-suited for various multimedia data modelling tasks, including text generation, text calculation, image classification, and video prediction

    Vision Sensor-Based Road Detection for Field Robot Navigation

    No full text
    Road detection is an essential component of field robot navigation systems. Vision sensors play an important role in road detection for their great potential in environmental perception. In this paper, we propose a hierarchical vision sensor-based method for robust road detection in challenging road scenes. More specifically, for a given road image captured by an on-board vision sensor, we introduce a multiple population genetic algorithm (MPGA)-based approach for efficient road vanishing point detection. Superpixel-level seeds are then selected in an unsupervised way using a clustering strategy. Then, according to the GrowCut framework, the seeds proliferate and iteratively try to occupy their neighbors. After convergence, the initial road segment is obtained. Finally, in order to achieve a globally-consistent road segment, the initial road segment is refined using the conditional random field (CRF) framework, which integrates high-level information into road detection. We perform several experiments to evaluate the common performance, scale sensitivity and noise sensitivity of the proposed method. The experimental results demonstrate that the proposed method exhibits high robustness compared to the state of the art
    corecore